Root RAID HOWTO cookbook
Michael A. Robinton, michael@bzs.org <mailto:michael@bzs.org>
v1.06, 12 February 1998
This document provides a cookbook for creating a root mounted raid
filesystem and companion fallback rescue system using linux initrd.
There are complete step-by-step instructions for a raid1 md0 device.
Each step is accompanied by an explanation of its purpose. This
procedure may be used for all the other raid structures with minor
modifications.
1. Introduction
The reader is assumed to be familiar with the various types of raid
implementations, their advantages and drawbacks. This is not a
tutorial, just a set of instructions on how to implement root mounted
raid on a linux system. All of the information necessary to become
familiar with linux raid is listed here directly or by reference;
please read it before sending e-mail questions.
1.1. Where to get Up-to-date copies of this document.
Root-RAID-HOWTO
Available in LaTeX (for DVI and PostScript), plain text, and HTML.
sunsite.unc.edu/mdw/HOWTO/
<http://sunsite.unc.edu/mdw/HOWTO/>
Available in SGML and HTML.
ftp.bizsystems.com/pub/raid/
<ftp://ftp.bizsystems.com/pub/raid/>
1.2. Bugs
As of this writing, the problem of stopping a root mounted RAID device
has not yet been solved in a satisfactory way. A work-around proposed
by Ed Welbon and implemented by Bohumil Chalupa is incorporated into
this document which eliminates the need for a long ckraid at each boot
for raid1 and raid5 devices. Without the workaround, it is necessary
to ckraid the md device each time the system is re-booted. On a large
array this can severely reduce availability.
On my 6 gig RAID1 device running on a Pentium 166 with 128 megs of
ram, it takes well over half an hour to ckraid :-( after each re-boot.
The workaround stores the status of the array at shutdown on the real
boot device and compares it to a reference status placed there when
the system is first built. If the statuses match at reboot, the
superblock on the array is rebuilt on the next boot, otherwise the
operator is notified of the status error and the rescue system is left
running with all the raid tools available.
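The comparison at the heart of the workaround can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not the script used later in this document: the function name status_matches is hypothetical, and both arguments are simply paths to the saved status file and the reference file.

```shell
#!/bin/sh
# Sketch: compare the status saved at shutdown with the reference status.
# status_matches is a hypothetical helper; both arguments are file paths.
status_matches() {
    stat_file=$1
    ref_file=$2
    # A missing file counts as a mismatch, so the rescue system stays up.
    [ -f "$stat_file" ] && [ -f "$ref_file" ] || return 1
    # Succeed only when the two files hold the same text.
    [ "$(cat "$stat_file")" = "$(cat "$ref_file")" ]
}
```

A boot script could call it as "if status_matches /dosa/linux/raidstat.ro /dosa/linux/raidgood.ref; then ..." and fall back to the rescue system otherwise. Treating a missing file as a mismatch is a deliberately cautious choice.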
Rebuilding the superblock causes the system to ignore that the array
was powered down without mdstop by marking all the drives as OK, as if
nothing happened. This only works if all the drives are OK at
shutdown. If the array was operating with a bad drive, the operator
must remove the bad drive prior to restarting the md device or the
data can be corrupted.
None of this applies to raid0 which does not have to be mdstopped
before shutdown.
Final proposed solutions to this problem include a finalrd similar to
initrd, and mdrootstop which writes the clean flags to the array
during shutdown when it is mounted read only. I am sure there are
others.
In the meantime, the problem has been bypassed. Please let me
know when this problem is solved more cleanly!!!
1.3. Acknowledgements
The writings and e-mail from the following individuals helped to make
this document possible. Many of the ideas were stolen from the
helpful work of others; I have just tried to put it all in COOKBOOK
form so that it is straightforward to use. My thanks to:
· Linas Vepstas <mailto:linas@linas.org>
  for the RAID howto that explained most of this to me.
· Gadi Oxman <mailto:gadio@netvision.net.il>
  for answering my dumb 'newbie' questions.
· Ed Welbon <mailto:welbon@bga.com>
  for the excellent initrd.md package that inspired me to write
  this.
· Bohumil Chalupa <mailto:bochal@apollo.karlov.mff.cuni.cz> for
  implementing the re-boot 'workaround' that allows root-mounted-raid
  to work in a production environment.
· and many others who contributed to this work in one way or another.
1.4. Copyright Notice
This document is GNU copyleft by Michael Robinton michael@bzs.org
<mailto:michael@bzs.org>.
Permission to use, copy, distribute this document for any purpose is
hereby granted, provided that the author's / editor's name and this
notice appear in all copies and/or supporting documents; and that an
unmodified version of this document is made freely available. This
document is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY, either expressed or implied. While every effort
has been taken to ensure the accuracy of the information documented
herein, the author / editor / maintainer assumes NO RESPONSIBILITY for
any errors, or for any damages, direct or consequential, as a result
of the use of the information documented herein.
2. What you need BEFORE YOU START
The packages you need and the documentation that answers the most
common questions about setting up and running raid are listed below.
Please review them thoroughly.
2.1. Required Packages
You need to obtain the most recent versions of these packages.
· a linux kernel that supports raid, initrd and /dev/loopx
  I used linux-2.0.32
  <ftp://sunsite.unc.edu/pub/Linux/kernel/> from sunsite
· raid145-971022-2.0.31
  <ftp://ftp.kernel.org/pub/linux/daemons/raid/> patch adds support
  for raid1/4/5
· raidtools-pre3-0.42 <ftp://ftp.kernel.org/pub/linux/daemons/raid/>
  tools to create and maintain raid devices (documentation too).
· linuxthreads-0.71
  <ftp://ftp.inria.fr/INRIA/Projects/cristal/Xavier.Leroy> required
  threads package. Use ftp, browser doesn't work
  ftp.inria.fr/INRIA/Projects/cristal/Xavier.Leroy
· A Linux distribution, ready to install.
  I used Slackware-3.4 <ftp://ftp.cdrom.com/pub/linux> available
  everywhere.
The detailed instructions in this document are based on the above
packages. If the packages have been updated or you use a different
linux distribution, you may have to modify the procedures you find
here.
The patches, tool assortment, etc... may vary with 2.1 kernels.
Please check the most recent documentation at:
ftp.kernel.org/pub/linux/daemons/raid/
<ftp://ftp.kernel.org/pub/linux/daemons/raid/>
2.2. Other similar implementations.
I chose to include in the kernel all of the pieces necessary to run
from boot without loading any modules. My kernel image is a little
over 300k compressed.
Take a look at Ed Welbon's <mailto:welbon@bga.com> initrd.md.tar.gz
for another way to make a bootable raid device. He uses loadable
modules. A look at his concise scripts will show you how it is done
if you need a very small kernel with modules.
http://www.realtime.net/~welbon/initrd.md.tar.gz
<http://www.realtime.net/~welbon/initrd.md.tar.gz>
2.3. Documentation -- Recommended Reading
Please read:
/usr/src/linux/Documentation/initrd.txt
as well as the documentation and man pages that accompany the
raidtools set. In particular, read man mdadd as well as the
QuickStart.RAID document included in the raidtools package.
2.4. RAID resources
· sunsite.unc.edu/mdw/HOWTO/mini/Software-RAID
  <http://sunsite.unc.edu/mdw/HOWTO/mini/Software-RAID>
· www.ssc.com/lg/issue17/raid.html
  <http://www.ssc.com/lg/issue17/raid.html>
· linas.org/linux/raid.html <http://linas.org/linux/raid.html>
· ftp.kernel.org/pub/linux/daemons/raid/
  <ftp://ftp.kernel.org/pub/linux/daemons/raid/>
· www.realtime.net/~welbon/initrd.md.tar.gz
  <http://www.realtime.net/~welbon/initrd.md.tar.gz>
· luthien.nuclecu.unam.mx/~miguel/raid/
  <http://luthien.nuclecu.unam.mx/~miguel/raid/>
Mailing lists can be joined at:
· majordomo@nuclecu.unam.mx <mailto:majordomo@nuclecu.unam.mx> send a
  message to subscribe raiddev
  send mail to: raiddev@nuclecu.unam.mx
  <mailto:raiddev@nuclecu.unam.mx>
· majordomo@vger.rutgers.edu <mailto:majordomo@vger.rutgers.edu> send
  a message to subscribe linux-raid
  send mail to: linux-raid@vger.rutgers.edu
  <mailto:linux-raid@vger.rutgers.edu> (this seems to be the most
  active list)
3. initrd Cookbook for root mounted RAID
This is the procedure to make an 'initrd' ramdisk with rescue tools
for raid.
Specifically, this document refers to a RAID1 implementation; however,
it is generally applicable to any raid scheme with a root mounted raid
device.
3.1. Security Reminder
The rescue file system may be used stand alone. Should your raid array
fail to mount, you are left with the rescue system mounted and
running. TAKE THE APPROPRIATE SECURITY PRECAUTIONS!!!
3.2. Build the Kernel and Raid Tools
The first thing that must be done is to patch and build your kernel
and become familiar with the raid tools. Configure, mount and test
your raid device(s). The details of how to do this are included in the
raidtools package and briefly reviewed later in this document.
3.3. Build the initrd Rescue and Boot filesystem
I used the Slackware-3.4 distribution to build both the Rescue/Boot
filesystem and the filesystem for the production machine. Any linux
distribution should work fine. If you use a different distribution,
review the Slackware specific portion of this procedure and modify it
to suit your needs.
You can download the Slackware distribution from:
ftp.cdrom.com/pub/linux/ <ftp://ftp.cdrom.com/pub/linux/>
If you already have Slackware, you only need to download new
I use loadlin to boot the kernel image and ramdisk from a dos
partition. I chose to create a minimum ramdisk system using the
Slackware 'setup' script followed by installing the 'linuxthreads'
package and 'raidtools' over the clean Slackware installation on my
ramdisk. I used the identical procedure to build the production
system. So the rescue and production systems are very similar.
This installation process gives me a 'bare' system (save a copy of the
file) to which I overlay
/lib/modules/2.x.x......
/etc .... with a modified fstab
/etc/rc.d
/dev/md*
from my current system to customize it for the particular kernel and
machine that it is/will-be running on.
This makes the boot/rescue system the same system that is running on
the root mounted raid device, just skinnied down a bit, while allowing
the library, etc... revisions to always be current.
3.4. Start the STEP by STEP instructions
From the root home directory (/root):
cd /root
mkdir raidboot
cd raidboot
Create mount points to work on
mkdir mnt
mkdir mnt2
Make a file large enough to do the file system install. This will be a
lot larger than the final rescue file system. I chose 24 megs since
16 megs is not large enough
dd if=/dev/zero of=build bs=1024k count=24
associate the file with a loop device and generate an ext2 file system
on the file
losetup /dev/loop0 build
mke2fs -v -m0 -L initrd /dev/loop0
mount /dev/loop0 mnt
3.5. Install the distribution - Slackware Specific
``...skip Slackware Specific stuff'' and go to next section.
Now that an empty filesystem is created and mounted, run "setup".
Specify /root/raidboot/mnt
as the 'target'. The source is whatever you normally install from.
Select the packages you wish to install and proceed but DO NOT
configure.
Choose 'EXPERT' prompting mode.
I chose 'A', 'AP', and 'N', installing only the minimum to run the
system plus an editor I am familiar with (vi, jed, joe) that is
reasonably compact.
-------- SELECTING PACKAGES FROM SERIES A (BASE LINUX SYSTEM) --------
[X] aaa_base  Basic filesystem, shell, and utils - REQUIRED
[X] bash      GNU bash-1.14.7 shell - REQUIRED
[X] devs      Device files found in /dev - REQUIRED
[X] etc       System config files & utilities - REQUIRED
[X] shadow    Shadow password suite - REQUIRED
[ ] ide       Linux 2.0.30 no SCSI (YOU NEED 1 KERNEL)
[ ] scsi      Linux 2.0.30 with SCSI (YOU NEED 1 KERNEL)
[ ] modules   Modular Linux device drivers
[ ] scsimods  Loadable SCSI device drivers
[X] hdsetup   Slackware setup scripts - REQUIRED
[ ] lilo      Boots Linux (not UMSDOS), DOS, OS/2, etc.
[ ] bsdlpr    BSD lpr - printer spooling system
[ ] loadlin   Boots Linux (UMSDOS too!) from MS-DOS
[ ] pnp       Plug'n'Play configuration tool
[ ] umsprogs  Utilities needed to use the UMSDOS filesystem
[X] sysvinit  System V-like INIT programs - REQUIRED
[X] bin       GNU fileutils 3.12, elvis, etc. - REQUIRED
[X] ldso      Dynamic linker/loader - REQUIRED
[ ] ibcs2     Runs SCO/SysVr4 binaries
[X] less      A text pager utility - REQUIRED
[ ] pcmcia    PCMCIA card services support
[ ] getty     Getty_ps 2.0.7e - OPTIONAL
[X] gzip      The GNU zip compression - REQUIRED
[X] ps        Displays process info - REQUIRED
[X] aoutlibs  a.out shared libs - RECOMMENDED
[X] elflibs   The ELF shared C libraries - REQUIRED
[X] util      Util-linux utilities - REQUIRED
[ ] minicom   Serial transfer and modem comm package
[ ] cpio      The GNU cpio backup/archiving utility
[X] e2fsbn    Utilities for the ext2 file system
[X] find      GNU findutils 4.1
[X] grep      GNU grep 2.0
[ ] kbd       Change keyboard mappings
[X] gpm       Cut and paste text with your mouse
[X] sh_utils  GNU sh-utils 1.16 - REQUIRED
[X] sysklogd  Logs system and kernel messages
[X] tar       GNU tar 1.12 - REQUIRED
[ ] tcsh      Extended C shell version 6.07
[X] txtutils  GNU textutils-1.22 - REQUIRED
[ ] zoneinfo  Configures your time zone
----------------------------------------------------------------------
From the 'AP' series, I use only 'JOE', an editor I like, and 'MC', a
small and useful file management tool. You choose the utilities you
will need on your system.
--------- SELECTING PACKAGES FROM SERIES AP (APPLICATIONS) ---------
[ ] ispell    The International version of ispell
[ ] jove      Jonathan's Own Version of Emacs text editor
[ ] manpgs    More man pages (online documentation)
[ ] diff      GNU diffutils
[ ] sudo      Allow special users limited root access
[ ] ghostscr  GNU Ghostscript version 3.33
[ ] gsfonts1  Ghostscript fonts (part one)
[ ] gsfonts2  Ghostscript fonts (part two)
[ ] gsfonts3  Ghostscript fonts (part three)
[ ] jed       JED programmer's editor
[X] joe       joe text editor, version 2.8
[ ] jpeg      JPEG image compression utilities
[ ] bc        GNU bc - arbitrary precision math language
[ ] workbone  a text-based audio CD player
[X] mc        The Midnight Commander file manager
[ ] mt_st     mt ported from BSD - controls tape drive
[ ] groff     GNU troff document formatting system
[ ] quota     User disk quota utilities
[ ] sc        The 'sc' spreadsheet
[ ] texinfo   GNU texinfo documentation system
[ ] vim       Improved vi clone
[ ] ash       A small /bin/sh type shell - 62K
[ ] zsh       Zsh - a custom *nix shell
--------------------------------------------------------------------
From the 'N' package I only loaded TCPIP. This isn't really
necessary, but is very handy and allows access to the network while
working on a repair or update with the root raid array dismounted.
TCPIP also contains 'biff' which is used by some of the applications
in 'A'. If you don't install 'N' you might want to install the biff
package anyway.
----- SELECTING PACKAGES FROM SERIES N (NETWORK/NEWS/MAIL/UUCP) -----
[ ] apache    Apache WWW (HTTP) server
[ ] procmail  Mail delivery/filtering utility
[ ] dip       Handles SLIP/CSLIP connections
[ ] ppp       Point-to-point protocol
[ ] mailx     The mailx mailer
[X] tcpip     TCP/IP networking programs
[ ] bind      Berkeley Internet Name Domain server
[ ] rdist     Remote file distribution utility
[ ] lynx      Text-based World Wide Web browser
[ ] uucp      Taylor UUCP 1.06.1 with HDB && Taylor configs
[ ] elm       Menu-driven user mail program
[ ] pine      Pine menu-driven mail program
[ ] sendmail  The sendmail mail transport agent
[ ] metamail  Metamail multimedia mail extensions
[ ] smailcfg  Extra configuration files for sendmail
[ ] cnews     Spools and transmits Usenet news
[ ] inn       InterNetNews news transport system
[ ] tin       The 'tin' news reader (local or NNTP)
[ ] trn       'trn' for /var/spool/news
[ ] trn-nntp  'trn' for NNTP (install 1 'trn' maximum)
[ ] nn-spool  'nn' for /var/spool/news
[ ] nn-nntp   'nn' for NNTP (install 1 'nn' maximum)
[ ] netpipes  Network pipe utilities
---------------------------------------------------------------------
With the installation complete, say no to everything else (no to all
configuration requests) and exit the script.
3.6. Install linux pthreads
Now you must install the 'linuxthreads-0.71' library. I have included
this diff for the linuxthreads Makefile rather than explain the
details of the installation by hand. Save the original Makefile,
apply the diff and then:
cd /usr/src/linuxthreads-0.71
patch
make
make install
-------------------diff Makefile.old Makefile.raid-----------------
2a3,13
> # If you are building "linuxthreads" for installation on a mount
> # point which is not the "root" partition, redefine 'BUILDIR' to
> # the mount point to use as the "root" directory
> # You may wish to do this if you are building an 'initial ram disk'
> # such as used with bootable root raid devices.
> # REQUIRES ldconfig version 1.9.5 or better
> # do ldconfig -v to check
> #
> BUILDIR=/root/raidboot/mnt
> #BUILDIR=
>
81,82c92,93
< install pthread.h $(INCLUDEDIR)/pthread.h
< install semaphore.h $(INCLUDEDIR)/semaphore.h
---
> install pthread.h $(BUILDIR)$(INCLUDEDIR)/pthread.h
> install semaphore.h $(BUILDIR)$(INCLUDEDIR)/semaphore.h
84c95
< test -f /usr/include/sched.h || install sched.h $(INCLUDEDIR)/sched.h
---
> test -f $(BUILDIR)/usr/include/sched.h || install sched.h $(BUILDIR)$(INCLUDEDIR)/sched.h
86,89c97,103
< install $(LIB) $(LIBDIR)/$(LIB)
< install $(SHLIB) $(SHAREDLIBDIR)/$(SHLIB)
< rm -f $(LIBDIR)/$(SHLIB0)
< ln -s $(SHAREDLIBDIR)/$(SHLIB) $(LIBDIR)/$(SHLIB0)
---
> install $(LIB) $(BUILDIR)$(LIBDIR)/$(LIB)
> install $(SHLIB) $(BUILDIR)$(SHAREDLIBDIR)/$(SHLIB)
> rm -f $(BUILDIR)$(LIBDIR)/$(SHLIB0)
> ln -s $(SHAREDLIBDIR)/$(SHLIB) $(BUILDIR)$(LIBDIR)/$(SHLIB0)
> ifneq ($(BUILDIR),)
> ldconfig -r ${BUILDIR} -n $(SHAREDLIBDIR)
> else
91c105,106
< cd man; $(MAKE) MANDIR=$(MANDIR) install
---
> endif
> cd man; $(MAKE) MANDIR=$(BUILDIR)$(MANDIR) install
3.7. Install Raid Tools
The next step is the installation of the raid tools, raidtools-0.42.
You must run the "configure" script to point the Makefile at the build
directory for the ramdisk files:
cd /usr/src/raidtools-0.42
./configure --sbindir=/root/raidboot/mnt/sbin --prefix=/root/raidboot/mnt/usr
make
make install
Now!! the Makefile for install is not quite right so do the following
to clean up. This will be fixed in future releases so that the re-
linking will not be necessary.
Fix the make install error
The file links specified in the Makefile at 'LINKS' must be removed
and re-linked to operate properly.
cd /root/raidboot/mnt/sbin
ln -fs mdadd mdrun
ln -fs mdadd mdstop
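After re-linking, it may be worth confirming that both names really are symlinks to mdadd. The following is a minimal sketch, assuming a readlink utility is available on the build machine; check_md_links is a hypothetical helper name, not part of raidtools.

```shell
#!/bin/sh
# Sketch: confirm that mdrun and mdstop are symlinks pointing at mdadd.
# check_md_links is a hypothetical helper; pass the sbin directory to check.
check_md_links() {
    sbin_dir=$1
    for name in mdrun mdstop; do
        link="$sbin_dir/$name"
        # Fail if the entry is not a symlink or points somewhere else.
        if [ ! -L "$link" ] || [ "$(readlink "$link")" != "mdadd" ]; then
            echo "$link is not a symlink to mdadd" >&2
            return 1
        fi
    done
    return 0
}
```

For the rescue build this would be run as "check_md_links /root/raidboot/mnt/sbin".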
3.8. Remove un-needed directories and files from new filesystem.
Delete the following directories from the new filesystem. CAUTION:
DON'T DELETE FROM YOUR RUNNING SYSTEM! It's easy to do, guess how I
found out!!!
cd /root/raidboot/mnt
rm -r home/ftp/*
rm -r lost+found
rm -r usr/doc
rm -r usr/info
rm -r usr/local/man
rm -r usr/man
rm -r usr/openwin
rm -r usr/share/locale
rm -r usr/X*
rm -r var/man
rm -r var/log/packages
rm -r var/log/setup
rm -r var/log/disk_contents
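Because these commands are destructive when run from the wrong directory, one possible safeguard is a small wrapper that refuses to operate on the running root. This is a sketch only; prune_tree is a hypothetical helper, and it takes plain relative paths rather than shell globs.

```shell
#!/bin/sh
# Sketch: delete paths only beneath a build root, never beneath "/".
# prune_tree is a hypothetical helper: prune_tree ROOT PATH...
prune_tree() {
    root=$1
    shift
    # Resolve the root so that "/", "//" or "/." are all caught.
    resolved=$(cd "$root" 2>/dev/null && pwd -P) || return 1
    if [ "$resolved" = "/" ]; then
        echo "refusing to prune the running system" >&2
        return 1
    fi
    for p in "$@"; do
        case $p in
            # Paths containing ".." could climb out of the build root.
            *..*) echo "skipping suspicious path: $p" >&2; continue ;;
        esac
        rm -rf -- "$resolved/$p"
    done
    return 0
}
```

For example, "prune_tree /root/raidboot/mnt usr/doc usr/info usr/man" removes those three trees from the ramdisk build only.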
3.9. Create /dev/mdx
The last step simply copies the /dev/md* devices from the current file
system onto the rescue file system. You could create these with
mknod.
cp -a /dev/md* /root/raidboot/mnt/dev
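If the md device nodes do not exist on the current system, they can be created with mknod instead. The sketch below only prints the commands rather than running them. It assumes block major number 9 for md devices, which is consistent with the 0x900 value used for /dev/md0 in linuxrc later in this document; md_mknod_cmds is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print (not run) the mknod commands for md0 .. md(COUNT-1).
# md_mknod_cmds is a hypothetical helper; major number 9 is assumed,
# matching the 0x900 device number used for /dev/md0 in linuxrc.
md_mknod_cmds() {
    dev_dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "mknod $dev_dir/md$i b 9 $i"
        i=$((i + 1))
    done
}
```

Review the output of "md_mknod_cmds /root/raidboot/mnt/dev 4" first, then pipe it to sh as root if it looks right. The count of 4 is arbitrary.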
3.10. Create a bare filesystem suitable for initrd
Now you have a clean re-useable filesystem ready for customization.
Once customized, this file system can be used for rescue should the
raid device(s) become corrupted and the raid tools be needed to fix
them.
It will also be used to boot and root-mount the raid device by adding
the linuxrc file which will be discussed next.
Copy the file system to a smaller device for the initrd file, 16 megs
should be large enough.
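Whether 16 megs is really enough depends on the packages you installed. A rough check, sketched here under the assumption that du reports sizes in kilobytes with -k, compares the size of the build tree against the planned image size. fits_in_image is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: rough check that a directory tree fits in an image of SIZE_KB.
# fits_in_image is a hypothetical helper. du counts file data only, so
# leave some headroom for ext2 metadata.
fits_in_image() {
    tree=$1
    size_kb=$2
    # du -sk prints "<kilobytes><TAB><path>"; keep the first field.
    used_kb=$(du -sk "$tree" | cut -f1)
    [ "$used_kb" -lt "$size_kb" ]
}
```

With the build file system still mounted, "fits_in_image /root/raidboot/mnt 16384" succeeds when the tree is smaller than 16 megs.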
Create the smaller file system and mount it
cd /root/raidboot
dd if=/dev/zero of=bare.fs bs=1024k count=16
associate the file with a loop device and generate an ext2 file system
on the file
losetup /dev/loop1 bare.fs
mke2fs -v -m0 -L initrd /dev/loop1
mount /dev/loop1 mnt2
Copy the 'build' file system to 'bare.fs'
cp -a mnt/* mnt2
Save the 'bare.fs' system before customization so later update is
easy. The 'build' file system is no longer needed and may be deleted.
cd /root/raidboot
umount mnt
umount mnt2
losetup -d /dev/loop0
losetup -d /dev/loop1
rm build
cp bare.fs rescue
gzip -9 bare.fs
3.10.1. Create the BOOT/RESCUE initrd filesystem
Now copy the system dependent items that match the kernel from the
development platform, or you can manually modify the files in the
rescue file system to match your target system.
losetup /dev/loop0 rescue
mount /dev/loop0 mnt
Make sure your etc directory is clean of *~, core and log files. The
next 2 commands create some warning messages; ignore them.
cp -dp /etc/* mnt/etc
cp -dp /etc/rc.d/* mnt/etc/rc.d
mkdir mnt/lib/modules
cp -a /lib/modules/2.x.x mnt/lib/modules <--- your current 2.x.x
Edit the following files to correct them for your rescue system.
cd mnt
Non-network
etc/fstab comment out the mount of root and raid devices.
etc/mdtab should work OK
Network
etc/hosts
etc/resolv.conf
etc/hosts.equiv and related files
etc/rc.d/rc.inet1 correct ip#, mask, gateway, etc...
etc/rc.d/rc.S remove entire section on file system status
from:
# Test to see if the root partition is read-only
to but not including:
# remove /etc/mtab* so that mount will .....
This avoids the annoying warning that
the ramdisk is mounted rw.
etc/rc.d/rc.xxxxx others as required, see later on in this doc
root/.rhosts if present
home/xxxx/xxxx others as required
WARNING: The above procedure moves your password and shadow
files onto the rescue disk!!!!!
WARNING: You may not wish to do this for security reasons.
Create any directories for mounting /dev/disk... as may be required
that are unique to your system. Mine need:
cd /root/raidboot/mnt <--- initrd root
mkdir dosa dos partition mount point
mkdir dosc dos mirror mount point
The rescue file system is complete!
You will note upon examination of the files in the rescue file system,
that there are still many files that could be deleted. I have not
done this since it would overly complicate this procedure and most
raid systems have adequate disk and memory. If you wish to skinny
down the file system, go to it!
3.10.2. Making initrd boot the RAID device - linuxrc
To make the rescue disk boot the raid device, you need only copy the
executable script file:
linuxrc
to the root of the device.
---------------------- linuxrc --------------------
#!/bin/sh
# ver 1.07 2-12-98
# mount the proc file system
/bin/mount /proc
# This may vary for your system.
# Mount the dos partitions, try both
# in case one disk is dead
/bin/mount /dosa
/bin/mount /dosc
# Set a flag in case the raid status file is not found
# then check both drives for the status file
RAIDOWN="raidstat.ro not found"
/bin/echo "Reading md0 shutdown status."
if [ -f /dosa/linux/raidstat.ro ]; then
RAIDOWN=`/bin/cat /dosa/linux/raidstat.ro`
RAIDREF=`/bin/cat /dosc/linux/raidgood.ref`
else
if [ -f /dosc/linux/raidstat.ro ]; then
RAIDOWN=`/bin/cat /dosc/linux/raidstat.ro`
RAIDREF=`/bin/cat /dosc/linux/raidgood.ref`
fi
fi
# Test for a clean shutdown with all disks operational
if [ "${RAIDOWN}" != "${RAIDREF}" ]; then
echo "ERROR ${RAIDOWN}"
# Use the next 2 lines to BAIL OUT and leave rescue running
/bin/echo 0x100>/proc/sys/kernel/real-root-dev
exit # leaving the error files in dosa/linux,etc...
fi
# The raid array is clean, proceed by removing
# status file and writing a clean superblock
/bin/rm /dosa/linux/raidstat.ro
/bin/rm /dosc/linux/raidstat.ro
/sbin/mkraid /etc/raid1.conf -f --only-superblock
/bin/umount /dosa
/bin/umount /dosc
# Mount raid array
echo "Mounting md0, root filesystem"
/sbin/mdadd -ar
# If there are errors - BAIL OUT and leave rescue running
if [ $? -ne 0 ]; then
echo "RAID device has errors"
# Use the next 3 lines to BAIL OUT
/bin/rm /etc/mtab # remove bad mtab
/bin/echo 0x100>/proc/sys/kernel/real-root-dev
exit
fi
# else tell the kernel to switch to /dev/md0 as the /root device
# The 0x900 value is the device number calculated by:
# 256*major_device_number + minor_device number
/bin/echo 0x900>/proc/sys/kernel/real-root-dev
# umount /proc to deallocate initrd device ram space
/bin/umount /proc
/bin/echo "/dev/md0 mounted as root"
exit
#------------------ end linuxrc ----------------------
Add 'linuxrc' to initrd boot device
cd /root/raidboot
chmod 777 linuxrc
cp -p linuxrc mnt
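The device numbers written to /proc/sys/kernel/real-root-dev in linuxrc follow the formula given in its comments, 256*major + minor. A small sketch of that arithmetic follows; rootdev_hex is a hypothetical helper, and the major/minor pairs are the ones implied by the values in the script (9,0 for /dev/md0 and 1,0 for the ramdisk).

```shell
#!/bin/sh
# Sketch: compute the value written to /proc/sys/kernel/real-root-dev
# as 256 * major + minor, printed in hex.
# rootdev_hex is a hypothetical helper: rootdev_hex MAJOR MINOR
rootdev_hex() {
    major=$1
    minor=$2
    printf '0x%x\n' $((256 * $major + $minor))
}

rootdev_hex 9 0   # prints 0x900, the value used for /dev/md0
rootdev_hex 1 0   # prints 0x100, the value used to stay on the ramdisk
```

The same function gives the value for other md devices, for example "rootdev_hex 9 1" for /dev/md1.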
3.11. Modifying the rc-scripts for SHUTDOWN
To complete the installation, modify the rc scripts to save the md
status to the real root device when shutdown occurs.
In slackware this is rc.0 -> rc.6
I have modified Bohumil Chalupa's raid stop work-around slightly. His
original solution is presented in ``Appendix A''.
Since there are no linux partitions left on the production system
except md0, the dos partitions are used to store the raidOK readonly
status. I chose to write a file to each dos partition containing the
status of the md array at shutdown and signifying that the md device
has been remounted RO. This allows the system to be fail safe when one
of the hard drives dies.
I modified my rc.6 script to attempt dismount of the root raid1 array
and any other raid device in mdtab. You may need slightly different
scripts, but the basics should be the same. The complete rc.6 file is
shown in ``Appendix B''.
To capture the raid array shutdown status, just before the file
systems are dismounted insert:
RAIDSTATUS=`/bin/cat /proc/mdstat | /usr/bin/grep md0`
After all the file systems are dismounted (the root file system
remains mounted read-only), insert:
# root device remains mounted RO
# mount dos file systems RW
mount -n -o remount,ro /
echo "Writing RAID read-only boot FLAG(s)."
mount -n /dosa
mount -n /dosc
# create raid mounted RO flag in duplicate
# containing the shutdown status of the raid array
echo ${RAIDSTATUS} > /dosa/linux/raidstat.ro
echo ${RAIDSTATUS} > /dosc/linux/raidstat.ro
umount -n /dosa
umount -n /dosc
# Stop all the raid arrays (except root)
echo "Stopping raid"
mdstop -a
This will cleanly stop all raid devices except root. Root status is
passed to the next boot in raidstat.ro.
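The duplicate flag write can also be expressed as a small function that tolerates one missing mount point, which is the situation after a disk has died. This is a sketch under stated assumptions rather than a replacement for the rc.6 fragment above: write_raid_flag is a hypothetical helper, and the directories passed to it are assumed to be already mounted.

```shell
#!/bin/sh
# Sketch: write the shutdown status to raidstat.ro in every directory given.
# write_raid_flag is a hypothetical helper: write_raid_flag STATUS DIR...
# It succeeds if at least one copy was written, so one dead disk is tolerated.
write_raid_flag() {
    status=$1
    shift
    written=0
    for dir in "$@"; do
        if [ -d "$dir" ] && echo "$status" > "$dir/raidstat.ro"; then
            written=$((written + 1))
        else
            echo "could not write flag in $dir" >&2
        fi
    done
    [ "$written" -gt 0 ]
}
```

In the shutdown script this would be called as write_raid_flag "${RAIDSTATUS}" /dosa/linux /dosc/linux after both dos partitions are mounted.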
Copy the rc file to your new raid array, the rescue file system that
is still mounted on /root/raidboot/mnt and the development system if
it is on the same machine.
Modify rescue etc/fstab as needed and make sure rescue mdtab is
correct.
Now copy the rescue disk to your dos partition and everything should
be ready to boot the raid device as root.
umount mnt
losetup -d /dev/loop0
gzip -9 rescue
Copy rescue.gz to your dos partition.
All that remains is to test the new file system by rebooting. See
the loadlin parameters in my dos linux.bat file next.
3.12. Setting up loadlin boot for RESCUE and RAID
The disks I chose for my system are much larger than those manageable
by lilo. Therefore, I used loadlin to boot the system from a small dos
partition with a mirror (copy) on the companion disk.
My dos root system contains a small editor among the utilities so I
can modify the boot parameters of loadlin if necessary, allowing me to
reboot the linux system on my swap disk while testing.
The dos system contains this tree for linux:
c:\linux.bat
c:\linux\loadlin.exe
c:\linux\zimage
c:\linux\rescue.gz
c:\linux\raidgood.ref
c:\linux\raidstat.ro (only at shutdown)
---------------------- linux.bat ---------------------------
rem Sample DOS batch file to boot Linux.
rem Start the LOADLIN process:
rem c:\linux\loadlin c:\linux\zimage root=/dev/ram0 ro ramdisk_size=16384 initrd=c:\linux\rescue.gz mem=131072k
c:\linux\loadlin c:\linux\zimage root=/dev/md0 ro ramdisk_size=16384 initrd=c:\linux\rescue.gz mem=131072k
rem -- this is my development system -- it goes away later
rem c:\linux\loadlin c:\linux\zimage root=/dev/hda3 ro noinitrd mem=131072k
------------------------------------------------------------
***** >> NOTE!! the only difference between forcing the rescue system to
run and the raid device mounting, is the loadlin parameter
root=/dev/ram0 for the rescue system
root=/dev/md0 for RAID
With root=/dev/ram0 the RAID device will not mount
and the rescue system will run unconditionally.
If the RAID array fails, the rescue system is left mounted and running
(this seems to fail sometimes, I don't know why, it works when the
reset button is pushed but does not work with 'shutdown -r now').
4. Configuring the Production RAID system.
4.1. System specs.
Motherboard: Iwill P55TU dual ide + adaptec scsi
Processor: Intel P200
Disks: 2 ea. Maxtor 7 gig eide
The disk drives are designated by linux as 'hda' and 'hdc'
4.2. Partitioning the hard drives.
Since testing a large root mountable RAID array is difficult because
of the re-boot problem, I re-partitioned my swap space to include a
smaller RAID partition for testing purposes. You may find this
helpful.
DEVELOPMENT SYSTEM
/dev/hda1 dos 16meg
* /dev/hda2 extended 126m
/dev/hda3 linux 126m root partition during development
/dev/hda4 linux 6+gig raid1
* /dev/hda5 linux 26m test raid1
* /dev/hda6 linux swap 100m
/dev/hdc1 is simply an exact copy of hda1 so the
partition can be made active if hda fails
* /dev/hdc2 extended 126m
/dev/hdc3 linux 126m /usr/src during development
/dev/hdc4 linux 6+gig raid1 mirror
* /dev/hdc5 linux 26m test raid1 mirror
* /dev/hdc6 linux swap 100m
PRODUCTION SYSTEM
/dev/hda1 dos 16meg
/dev/hda2 linux swap 126m
/dev/hda3 linux swap 126m
/dev/hda4 linux 6+gig raid1
/dev/hdc1 is simply an exact copy of hda1
/dev/hdc2 linux swap 126m
/dev/hdc3 linux swap 126m
/dev/hdc4 linux 6+gig raid1 mirror
The hdx3 partitions were switched to 'swap' after developing this
utility. I could have done it on another machine, however, the
libraries and kernels are all about a year or more out of date on my
other linux boxes and I preferred to build it on the target machine.
I chose to partition this way and use loadlin rather than lilo because:
1. The main partition (6 gig) is too large to accommodate booting with
lilo alone and would have required an additional smaller partition
located within the first 1024 disk addresses.
2. In the event that one drive fails catastrophically, the system must
continue to run and be bootable with minimum effort and NO data
loss.
· If either hard drive fails, the boot will abort, and the rescue
  system will run. Examination of the screen message or
  /dosx/linux/raidstat.ro will tell the operator the status of the
  failed array.
· If hda fails, the dos partition on hdc must be made 'active' and
  the bios must recognize hdc as the boot device, or the drive must
  be physically moved to the hda position by re-cabling. The raid
  system can then be made active again by removing the failed drive
  and issuing:
  "/sbin/mkraid /etc/raid1.conf -f --only-superblock"
  to rebuild the remaining superblock.
· Once this is done, then
  mdadd -ar
· Examine the status of the array to verify that everything is OK
  then replace the good array reference with the current status until
  the failed disk can be repaired or replaced.
cat /proc/mdstat | grep md0 > /dosa/linux/raidgood.ref
shutdown -r now
to do a clean reboot, and the system is up again.
5. Building the RAID file system.
This description is for my RAID1 system described in the system specs.
Your system may have a different RAID architecture, so modify as
appropriate. Please read the man pages and QuickStart.RAID that come
with raidtools-0.42. My /etc/raid1.conf contains:
# raid-1 configuration
raiddev /dev/md0
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
device /dev/hda4
raid-disk 0
device /dev/hdc4
raid-disk 1
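For a different pair of partitions, the same file can be generated rather than typed. The sketch below simply reproduces the layout of the configuration above with the device names substituted; gen_raid1_conf is a hypothetical helper and it covers only this two-disk raid-1 case with no spares.

```shell
#!/bin/sh
# Sketch: print a two-disk raid-1 configuration in the layout shown above.
# gen_raid1_conf is a hypothetical helper: gen_raid1_conf MD_DEV DISK0 DISK1
gen_raid1_conf() {
    cat <<EOF
# raid-1 configuration
raiddev $1
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
device $2
raid-disk 0
device $3
raid-disk 1
EOF
}
```

Running "gen_raid1_conf /dev/md0 /dev/hda4 /dev/hdc4" prints the file shown above; redirect the output to /etc/raid1.conf once you have checked it.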
5.1. Step by Step procedures for building production RAID file system.
For my RAID1 system I did a complete install of:
Slackware-3.4
linuxthreads-0.71
raidtools-0.42
linux-2.0.32 with raid145 patch
Create and format the raid device.
mkraid /etc/raid1.conf
mdcreate raid1 /dev/md0 /dev/hda4 /dev/hdc4
mdadd -ar
mke2fs /dev/md0
mkdir /md
mount -t ext2 /dev/md0 /md
Create the reference files that reboot will use; the paths may be
different on your system.
cat /proc/mdstat | grep md0 > /dosa/linux/raidgood.ref
cat /proc/mdstat | grep md0 > /dosc/linux/raidgood.ref
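Since the same two commands are repeated several times in this
procedure, they can be wrapped in a small function. This is a sketch
only; save_raid_ref is a name I made up, and the "-s" option exists
purely so the function can be tried against a copy of /proc/mdstat
rather than the live file.

```shell
#!/bin/sh
# Hypothetical helper: write the md0 line of a status file to
# raidgood.ref in each directory given.
save_raid_ref() {
    # usage: save_raid_ref [-s statusfile] dir...
    status=/proc/mdstat
    if [ "$1" = "-s" ]; then
        status=$2
        shift 2
    fi
    # Refuse to write an empty reference if md0 is not listed.
    line=$(grep md0 "$status") || {
        echo "md0 not found in $status" >&2
        return 1
    }
    for dir in "$@"; do
        printf '%s\n' "$line" > "$dir/raidgood.ref" || return 1
    done
}
```

On my system the call would be "save_raid_ref /dosa/linux /dosc/linux".
Unlike the plain cat/grep pair, it will not leave an empty reference
file behind when the array is not running.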
Use Slackware-3.4 or another distribution to build your OS
setup
Specify '/md' as the target, and as the source whatever you normally
use. Select and install the disksets of interest except for the
kernel. Configure the system, but skip the section on lilo and kernel
booting. Exit setup.
Install 'pthreads'
cd /usr/src/linuxthreads-0.71
edit the Makefile and specify
BUILDIR=/md
make
make install
Install 'raidtools'
cd /usr/src/raidtools-0.42
configure --sbindir=/md/sbin --prefix=/md/usr
fix the raidtools make install error
cd /md/sbin
rm mdrun
rm mdstop
ln -s mdadd mdrun
ln -s mdadd mdstop
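The link fix above can be made repeatable, so that running it a second
time does no harm. A minimal sketch, assuming only that mdrun and
mdstop should be symbolic links to mdadd as shown; fix_md_links is a
hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: recreate mdrun and mdstop as links to mdadd.
fix_md_links() {
    # $1 = sbin directory that holds mdadd, e.g. /md/sbin
    [ -f "$1/mdadd" ] || { echo "no mdadd in $1" >&2; return 1; }
    for name in mdrun mdstop; do
        rm -f "$1/$name"                   # drop whatever make install left
        ln -s mdadd "$1/$name" || return 1 # relative link, survives a move
    done
}
```

Called as "fix_md_links /md/sbin". The links are relative, so they
remain valid when /md later becomes the root file system.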
Create /dev/mdx
cp -a /dev/md* /md/dev
Add the system configuration from the current system (ignore errors).
cp -dp /etc/* /md/etc
cp -dp /etc/rc.d/* /md/etc/rc.d (include the new rc.6)
mkdir /md/lib/modules
cp -a /lib/modules/2.x.x /md/lib/modules <--- your current 2.x.x
Edit the following files to correct them for your file system
cd /md
Non-network
etc/fstab correct for real root and raid devices.
etc/mdtab should work OK
Network
etc/hosts
etc/resolv.conf
etc/hosts.equiv and related files
etc/rc.d/rc.inet1 correct ip#, mask, gateway, etc...
etc/rc.d/rc.S remove entire section on file system status
from:
# Test to see if the root partition is read-only
to but not including:
# remove /etc/mtab* so that mount will .....
This avoids the annoying warning that
the ramdisk is mounted rw.
etc/rc.d/rc.xxxxx others as required
root/.rhosts if present
home/xxxx/xxxx others as required
WARNING: The above procedure moves your password and shadow
files onto the new file system!!!!!
WARNING: You may not wish to do this for security reasons.
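The rc.S edit described above (remove everything from one comment line
up to, but not including, another) can also be done mechanically. The
following is a sketch, not a tested patch for any particular rc.S;
strip_section is a hypothetical name, and the two marker strings must
be checked against the comments in your own file before use.

```shell
#!/bin/sh
# Hypothetical filter: copy stdin to stdout, dropping the lines from the
# one that begins with FROM up to (not including) the one that begins
# with UPTO. Both markers are matched literally at the start of a line.
strip_section() {
    # usage: strip_section FROM UPTO < input > output
    awk -v from="$1" -v upto="$2" '
        index($0, upto) == 1 { skip = 0 }   # end marker: resume, keep line
        index($0, from) == 1 { skip = 1 }   # start marker: begin skipping
        !skip                { print }
    '
}
```

For example, run from /md:

    strip_section '# Test to see if the root partition' \
                  '# remove /etc/mtab' < etc/rc.d/rc.S > /tmp/rc.S.new

Inspect the result before copying it over etc/rc.d/rc.S.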
Create any directories for mounting /dev/disk... as may be required
that are unique to your system. Mine need:
cd /md <--- new file system root
mkdir dosa dos partition mount point
mkdir dosc dos mirror mount point
The new file system is complete. Be sure to save the md reference
status to the 'real' root device, and you are ready to boot.
mount the dos partitions on dosa and dosc
cat /proc/mdstat | grep md0 > /dosa/linux/raidgood.ref
cat /proc/mdstat | grep md0 > /dosc/linux/raidgood.ref
mdstop /dev/md0
6. One last thought.
Remember that an expert is someone who knows at least 1% more than you
do about a subject. Bear this in mind when you e-mail me for help.
I'll try, but I've only done this once!
Michael Robinton Michael@bzs.org <mailto:michael@bzs.org>
7. Appendix A. - Bohumil Chalupa's md0 shutdown
What follows is Bohumil Chalupa's post to the linux raid list on the
workaround for the raid1 + 5 mdstop problem. His solution does not
address the possibility of the raid device being corrupt at shutdown,
so I have added a simple status comparison to a good reference status
at boot. This allows the operator to intervene if something is wrong
with a disk in the array. The description of this is in the main body
of this document.
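The idea of the status comparison can be sketched in a few lines. This
is an illustration only and is not the linuxrc from the main body;
raid_boot_state and normalize are hypothetical names, and what to do
for each result is left to the caller.

```shell
#!/bin/sh
# Collapse runs of blanks so that a copy written with an unquoted
# "echo" still matches a line taken directly from /proc/mdstat.
normalize() {
    awk '{ $1 = $1; print }' "$1"
}

# Hypothetical check: compare the status saved at shutdown with the
# known-good reference and print one word describing the result.
raid_boot_state() {
    # $1 = reference file (raidgood.ref)
    # $2 = status file written at shutdown (raidstat.ro)
    if [ ! -f "$1" ]; then
        echo noref      # no reference was ever saved
    elif [ ! -f "$2" ]; then
        echo unclean    # shutdown never wrote its flag
    elif [ "$(normalize "$1")" = "$(normalize "$2")" ]; then
        echo good       # array matched the reference at shutdown
    else
        echo changed    # array status differed; operator should look
    fi
}
```

A boot script could then branch on the word returned, for instance
with "case `raid_boot_state /dosa/linux/raidgood.ref
/dosa/linux/raidstat.ro` in ...".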
> From: Bohumil Chalupa <bochal@apollo.karlov.mff.cuni.cz>
>
> I can now boot initrd and use linuxrc to start the RAID1 array,
> then successfully switch root to /dev/md0.
>
> I don't know, however, any way how to cleanly _stop_ the array.
Well. I have to answer myself :-)
> Date: Mon, 29 Dec 1997 02:21:38 -0600 (CST)
> From: Edward Welbon <welbon@bga.com>
> Subject: Re: dismounting root raid device
>
> For md devices other than raid0, there is probably state that needs to
> be saved that is only known once all writes have completed. Such state
> of course can't be saved to root once it is mounted readonly. In that
> case, you would have to be able to mount a writeable filesystem "X"
> on the readonly root and be able to write to "X" (I recall doing this
> during "rescue" operations, but not as an automated procedure).
>
> The filesystem "X" would presumably be a boot device from which the raid
> (during linuxrc execution via initrd) would pickup its initial state from.
> Fortunately raid0 isn't required to write out any state (though it would
> be pleasant to be able to write the check sums to mdtab after an mdstop).
> Eventually, I will fiddle with this but it doesn't seem difficult though
> the "devil" is always in the "details".
Yes, that's it.
I had this idea in mind for some time already, but had no time to try it.
Yesterday I did, and it works.
With my RAID1 (mirror), I don't save any checksums or raid superblock data.
I only save an information on the "real" boot partition, that the root md
volume was remounted readonly during shutdown. Then, during boot, the
linuxrc script runs mkraid --only-superblock when it finds this
information; otherwise, it runs ckraid.
This means, that the raid superblock information is not updated during
shutdown; it's updated at the boot time.
It is not very clean, I'm afraid, :-( but it works.
I'm using Slackware and initrd.md by Edward Welbon to boot the root raid
device.
As far as I remember now, the only modified files are
mkdisk and linuxrc, and /etc/rc.d/rc.6 shutdown script.
And lilo.conf, of course.
I'm appending the important parts.
Bohumil Chalupa
--------------- my.linuxrc follows -----------------
#!/bin/sh
# we need /proc
/bin/mount /proc
# start up the md0 device. let the /etc/rc.d scripts get the rest of them
# we should do as little as possible here
# ________________________________________
# root raid1 shutdown test & recreation
# /start must be created on the rd image in my.mkdisk
echo "preparing md0: mounting /start"
/bin/mount /dev/sda2 /start -t ext2
echo "reading saved md0 state from /start"
if [ -f /start/root.raid.ok ]; then
echo "raid ok, modifying superblock"
rm /start/root.raid.ok
/sbin/mkraid /etc/raid1.conf -f --only-superblock
else
echo "raid not clean, running ckraid --fix"
/sbin/ckraid --fix /etc/raid1.conf
fi
echo "unmounting /start"
/bin/umount /start
# _________________________________________
#
echo "adding md0 for root file system"
/sbin/mdadd /dev/md0 /dev/sda1 /dev/sdb1
echo "starting md0"
/sbin/mdrun -p1 /dev/md0
# tell kernel we want to switch to /dev/md0 as root device, the 0x900 value
# is arrived at via 256*major_device_number + minor_device number.
echo "setting real-root-dev"
/bin/echo 0x900>/proc/sys/kernel/real-root-dev
# unmount /proc so that the ram disk can be deallocated.
echo "unmounting /proc"
/bin/umount /proc
/bin/echo "We are hopefully ready to mount /dev/md0 (major 9, minor 0) as root"
exit
--------------- end of my.linuxrc ----------------------------------
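The 0x900 written to real-root-dev in the script above comes from the
formula given in its comment, 256*major + minor. For other devices the
value can be worked out with shell arithmetic. A small sketch;
root_dev_number is a hypothetical name.

```shell
#!/bin/sh
# Compute the number to write to /proc/sys/kernel/real-root-dev
# as 256*major + minor, printed in hexadecimal.
root_dev_number() {
    # $1 = major device number, $2 = minor device number
    printf '0x%x\n' $(( 256 * $1 + $2 ))
}

root_dev_number 9 0    # /dev/md0 (major 9, minor 0) -> 0x900
```

The major and minor numbers of a device are shown by "ls -l" on the
device node, in place of the file size.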
----------- extract from /etc/rc.d/rc.6 follows -----------------
# Turn off swap, then unmount local file systems.
echo "Turning off swap."
swapoff -a
echo "Unmounting local file systems."
umount -a -tnonfs
# Don't remount UMSDOS root volumes:
if [ ! "`mount | head -1 | cut -d ' ' -f 5`" = "umsdos" ]; then
mount -n -o remount,ro /
fi
# Save raid state
echo "Saving RAID state"
/bin/mount -n /dev/sda2 /start -t ext2
touch /start/root.raid.ok
/bin/umount -n /start
-------------- end of excerpt from rc.6 ------------------------
------------------ part of my.mkdisk follows ----------------------
#
# now we have the filesystem ready to be populated, we need to
# get a few important directories. I had endless trouble till
# I created a pristine mtab. In my case, it is convenient that
# /etc/mdtab is copied over, this way I can activate md with
# a simple "/sbin/mdadd -ar" in linuxrc.
#
cp -a $ROOT/etc $MOUNTPNT 2>cp.stderr 1>cp.stdout
rm -rf $MOUNTPNT/etc/mtab
rm -rf $MOUNTPNT/etc/ppp*
rm -rf $MOUNTPNT/etc/termcap
rm -rf $MOUNTPNT/etc/sendmail*
rm -rf $MOUNTPNT/etc/rc.d
rm -rf $MOUNTPNT/etc/dos*
cp -a $ROOT/sbin $ROOT/dev $ROOT/lib $ROOT/bin $MOUNTPNT 2>>cp.stderr 1>>cp.stdout
# _____________________________________________________________________
# RAID: will need mkraid and ckraid
cp -a $ROOT/usr/sbin/mkraid $ROOT/usr/sbin/ckraid $MOUNTPNT/sbin 2>>cp.stderr 1>>cp.stdout
# ---------------------------------------------------------------------
# it seems that init wont come out to play unless it has utmp. this can
# probably be pruned back alot. no telling what the real bug was 8-).
#
mkdir $MOUNTPNT/var $MOUNTPNT/var/log $MOUNTPNT/var/run $MOUNTPNT/initrd
touch $MOUNTPNT/var/run/utmp $MOUNTPNT/etc/mtab
chmod a+r $MOUNTPNT/var/run/utmp $MOUNTPNT/etc/mtab
ln -s /var/run/utmp $MOUNTPNT/var/log/utmp
ln -s /var/log/utmp $MOUNTPNT/etc/utmp
ls -lstrd $MOUNTPNT/etc/utmp $MOUNTPNT/var/log/utmp $MOUNTPNT/var/run/utmp
#
# since I wanted to change the mount point, I needed this though
# I suppose that I could have done a "mkdir /proc" in linuxrc.
#
mkdir $MOUNTPNT/proc
chmod 555 $MOUNTPNT/proc
#
# ------------------------------------------------------
# we'll mount the real boot device to /start temporarily
# to check the root raid state saved at shutdown time
#
mkdir $MOUNTPNT/start
# -------------------------------------------------------
#
# need linuxrc (it is, after all, the point of this exercise).
#
if [ -x ./my.linuxrc ]; then
cp -a ./my.linuxrc $MOUNTPNT/linuxrc
chmod 777 $MOUNTPNT/linuxrc
else
ln -s /bin/sh $MOUNTPNT/linuxrc
fi
#
----------------- part of my.mkdisk ends -----------------
8. Appendix B. - complete rc.0 -> rc.6 file
#! /bin/sh
#
# rc.6 This file is executed by init when it goes into runlevel
# 0 (halt) or runlevel 6 (reboot). It kills all processes,
# unmounts file systems and then either halts or reboots.
#
# Version: @(#)/etc/rc.d/rc.6 1.50 1994-01-15
#
# Author: Miquel van Smoorenburg <miquels@drinkel.nl.mugnet.org>
# Modified by: Patrick J. Volkerding, <volkerdi@ftp.cdrom.com>
# Modified by: Michael A. Robinton, <michael@bzs.org> for RAID shutdown
# Set the path.
PATH=/sbin:/etc:/bin:/usr/bin
# Set linefeed mode to avoid staircase effect.
stty onlcr
echo "Running shutdown script $0:"
# Find out how we were called.
case "$0" in
*0)
message="The system is halted."
command="halt"
;;
*6)
message="Rebooting."
command=reboot
;;
*)
echo "$0: call me as \"rc.0\" or \"rc.6\" please!"
exit 1
;;
esac
# Kill all processes.
# INIT is supposed to handle this entirely now, but this didn't always
# work correctly without this second pass at killing off the processes.
# Since INIT already notified the user that processes were being killed,
# we'll avoid echoing this info this time around.
if [ "$1" != "fast" ]; then # shutdown did not already kill all processes
killall5 -15
killall5 -9
fi
# Try to turn off quota and accounting.
if [ -x /usr/sbin/quotaoff ]
then
echo "Turning off quota."
/usr/sbin/quotaoff -a
fi
if [ -x /sbin/accton ]
then
echo "Turning off accounting."
/sbin/accton
fi
# Before unmounting file systems write a reboot or halt record to wtmp.
$command -w
# Save localtime
[ -e /usr/lib/zoneinfo/localtime ] && cp /usr/lib/zoneinfo/localtime /etc
# Asynchronously unmount any remote filesystems:
echo "Unmounting remote filesystems."
umount -a -tnfs &
# you must have issued
# 'cat /proc/mdstat | grep md0 > {your boot vol}/linux/raidgood.ref'
# before linuxrc will execute properly with this info
RAIDSTATUS=`/bin/cat /proc/mdstat | /usr/bin/grep md0` # capture raid status
# Turn off swap, then unmount local file systems.
# clearing mdtab as well
echo "Turning off swap."
swapoff -a
echo "Unmounting local file systems."
umount -a -tnonfs
# Don't remount UMSDOS root volumes:
if [ ! "`mount | head -1 | cut -d ' ' -f 5`" = "umsdos" ]; then
mount -n -o remount,ro /
fi
# root device remains mounted
# mount dos file systems RW
echo "Writing RAID read-only boot FLAG(s)."
mount -n /dosa
mount -n /dosc
# create raid mounted RO flag in duplicate
# containing the shutdown status of the raid array
echo ${RAIDSTATUS} > /dosa/linux/raidstat.ro
echo ${RAIDSTATUS} > /dosc/linux/raidstat.ro
umount -n /dosa
umount -n /dosc
# Stop all the raid arrays (except root)
echo "Stopping raid"
mdstop -a
# See if this is a powerfail situation.
if [ -f /etc/power_is_failing ]; then
echo "Turning off UPS, bye."
/sbin/powerd -q
exit 1
fi
# Now halt or reboot.
echo "$message"
[ ! -f /etc/fastboot ] && echo "On the next boot fsck will be FORCED."
$command -f
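The flag is written in duplicate so that one copy survives the loss of
either drive. The script above does not check whether either write
succeeded. A possible refinement, shown only as a sketch with a
hypothetical function name, is to warn about each failed copy and
report failure only when no copy at all could be written.

```shell
#!/bin/sh
# Hypothetical helper: write the shutdown status to raidstat.ro in each
# directory given. Succeeds if at least one copy was written.
write_raid_flag() {
    # usage: write_raid_flag "status text" dir...
    status=$1
    shift
    written=0
    for dir in "$@"; do
        if printf '%s\n' "$status" > "$dir/raidstat.ro"; then
            written=$((written + 1))
        else
            echo "warning: could not write $dir/raidstat.ro" >&2
        fi
    done
    [ "$written" -gt 0 ]
}
```

In rc.6 this would replace the two echo lines, for example
'write_raid_flag "$RAIDSTATUS" /dosa/linux /dosc/linux', placed between
the mount and umount of the dos partitions.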